R-learning in actor-critic model offers a biologically relevant mechanism for sequential decision-making

Neural Information Processing Systems

In real-world settings, we repeatedly decide whether to pursue better conditions or to keep things unchanged. Examples include time investment, employment, and entertainment preferences. How do we make such decisions? To address this question, the field of behavioral ecology has developed foraging paradigms: model settings in which human and non-human subjects decide when to leave a depleting food resource. Foraging theory, represented by the marginal value theorem (MVT), provides accurate average-case stay-or-leave rules consistent with subjects' behavior toward depleting resources. Yet the algorithms underlying individual choices, and how such algorithms are learned, remain unclear.
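The abstract contrasts MVT-style average-case rules with trial-by-trial learning algorithms. As a minimal, hedged sketch of the distinction at stake (not the paper's actual model; every function name, parameter value, and update rule below is a hypothetical choice for illustration), here is how an average-reward (R-learning) TD error differs from the standard discounted TD error, and how that error drives a schematic actor-critic update:

```python
# Illustrative sketch only -- not the paper's implementation.
# All names and parameter values are hypothetical.

def discounted_td_error(r, v_s, v_next, gamma=0.95):
    """Standard discounted TD error: delta = r + gamma * V(s') - V(s)."""
    return r + gamma * v_next - v_s

def r_learning_td_error(r, rho, v_s, v_next):
    """Average-reward (R-learning) TD error: delta = r - rho + V(s') - V(s),
    where rho is the running estimate of the average reward rate."""
    return r - rho + v_next - v_s

def actor_critic_step(v, rho, pref_leave, r, s, s_next, alpha=0.1, beta=0.01):
    """One schematic actor-critic update driven by the R-learning TD error.

    v          -- dict mapping state -> critic value estimate V(s)
    rho        -- current average-reward estimate
    pref_leave -- dict mapping state -> actor preference for 'leave'
    Returns the updated rho.
    """
    delta = r_learning_td_error(r, rho, v[s], v[s_next])
    v[s] += alpha * delta           # critic update
    rho += beta * delta             # average-reward estimate update
    pref_leave[s] += alpha * delta  # actor update (stay-vs-leave preference)
    return rho
```

The key difference: R-learning subtracts an estimated reward rate rho instead of discounting the future, so the same reward in a patch is evaluated relative to the environment's average — the quantity the MVT leave rule compares against.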


Review for NeurIPS paper: R-learning in actor-critic model offers a biologically relevant mechanism for sequential decision-making


Weaknesses: More attention should be paid to teasing out the differences between V-learning and R-learning, with intermittent initial rewards being essentially the only example. Although it is impressive that new VTA recording data are presented in the paper, I don't feel the result is particularly helpful: it only shows that VTA activity does not contradict the R-learning model, not that it specifically supports it. It should be possible to design tasks/protocols under which the two formalisations would have substantially different TD errors, which could help tease out biological correlates of the two models. Furthermore, it would be nice to see more details of parameter estimation and the resulting best-fitting parameter values, which, if done properly, may allow not only a qualitative but also a better quantitative fit between Figure 1E and Figure 1D (as well as between Figure 1D and Figure 1B). As the models have multiple parameters that substantially affect performance, the two models should be compared under best-fitting parameters using formal measures such as AIC, not just qualitative fits. Of course, model universality regardless of parameters is helpful, but a quantitative fit is equally important.
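The review's call for formal model comparison refers to the Akaike Information Criterion, AIC = 2k − 2 ln L, which penalizes a model's maximized log-likelihood by its parameter count. A minimal sketch (the log-likelihood values below are made-up placeholders, not fits from the paper):

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: AIC = 2k - 2*ln(L).

    log_likelihood -- maximized log-likelihood ln(L) of the fitted model
    n_params       -- number of free parameters k
    Lower AIC indicates the preferred model.
    """
    return 2 * n_params - 2 * log_likelihood

# Hypothetical comparison of two fitted models (placeholder numbers):
aic_v = aic(log_likelihood=-1203.4, n_params=3)  # discounted V-learning
aic_r = aic(log_likelihood=-1190.7, n_params=4)  # R-learning
# A lower AIC favors R-learning here despite its extra parameter.
```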


Review for NeurIPS paper: R-learning in actor-critic model offers a biologically relevant mechanism for sequential decision-making


This is a well-written and well-presented paper proposing a new framework for modeling animal behavior during a foraging task, and it should be of interest to the NeurIPS audience. After the rebuttal, three of the reviewers recommended acceptance based on the paper providing a nice link between the behavioral-economics and reinforcement-learning communities, and on its strengths in both theory and empirical results. Therefore, I tentatively recommend accept. That said, during the discussions some concerns were raised regarding missing related work. I urge the authors to discuss in their final version several related works that R4 and I think are quite relevant: Daw et al., 2002, Neural Networks; Schweighofer & Doya, 2003, Neural Networks; Niv et al., 2006/2007 (and related); and also some works from the motivation-modeling literature that R2 mentions in their review.
